Instruction-level parallelism (ILP) in nonnumerical code is regarded as scarce and hard to
exploit due to its irregularity. In this article, we introduce a new code-scheduling 
technique for irregular ILP called "selective scheduling" which can be used as a 
component for superscalar and VLIW compilers. Selective scheduling can compute a wide 
set of independent operations across all execution paths based on renaming and
forward-substitution and can compute availab ...
This paper is a scientific comparison of two code generation techniques with identical
goals — generation of the best possible software pipelined code for computers with 
instruction level parallelism. Both are variants of modulo scheduling, a framework for 
generation of software pipelines pioneered by Rau and Glaser [RaGl81], but are
otherwise quite dissimilar. One technique was developed at Silicon Graphics and is used in
the MIPSpro compiler. This is the production compiler for SGI's s ...
VLIW processors
Publisher: ACM Press

Full text available: pdf(216.58 KB) Additional Information: full citation, abstract, references, citings, index terms

Phase-decoupled methods for code generation are the state of the art in compilers for 
standard processors but generally produce code of poor quality for irregular target 
architectures such as many DSPs. In that case, the generation of efficient code requires 
the simultaneous solution of the main subproblems instruction selection, instruction 
scheduling, and register allocation, as an integrated optimization problem. 

In contrast to compilers for standard processors, code generation for ... 

Keywords: dynamic programming, instruction scheduling, instruction selection, 
Full text available: pdf(4.72 MB) Additional Information: full citation, abstract, references, citings, index terms

Utilizing parallelism at the instruction level is an important way to improve performance. 
Because the time spent in loop execution dominates total execution time, a large body of 
optimizations focuses on decreasing the time to execute each iteration. Software 
pipelining is a technique that reforms the loop so that a faster execution rate is realized. 
Iterations are executed in overlapped fashion to increase parallelism. Let {ABC}n 
Keywords: instruction level parallelism, loop reconstruction, optimization, software
pipelining



Accurate and efficient predicate analysis with binary decision diagrams 
John W. Sias, Wen-Mei W. Hwu, David I. August 
Publisher: ACM Press

Full text available: pdf(1.63 MB) Additional Information: full citation, abstract, references, citings, index
The foremost goal of superscalar processor design is to increase performance through the
exploitation of instruction-level parallelism (ILP). Previous studies have shown that 
speculative execution is required for high instruction per cycle (IPC) rates in
non-numerical applications. The general trend has been toward supporting speculative
execution in complicated, dynamically-scheduled processors. Performance, though, is 
more than just a high IPC rate; it also depends upon instruction count ... 

10 VLIW compilation techniques in a superscalar environment 

Kemal Ebcioglu, Randy D. Groves, Ki-Chang Kim, Gabriel M. Silberman, Isaac Ziv 
June 1994 ACM SIGPLAN Notices , Proceedings of the ACM SIGPLAN 1994 conference 
on Programming language design and implementation PLDI '94, volume 29 
Publisher: ACM Press

Full text available: pdf(1.30 MB) Additional Information: full citation, abstract, references, citings, index terms

We describe techniques for converting the intermediate code representation of a given 
program, as generated by a modern compiler, to another representation which produces 
the same run-time results, but can run faster on a superscalar machine. The algorithms, 
based on novel parallelization techniques for Very Long Instruction Word (VLIW) 
architectures, find and place together independently executable operations that may be 
far apart in the original code, i.e., they may be se ... 

Keywords: VLIW, compiler optimizations, global scheduling, profiling directed feedback,
software pipelining, superscalars
Randi Rost

August 2004 Proceedings of the conference on SIGGRAPH 2004 course notes GRAPH '04

Publisher: ACM Press 

Full text available: pdf(7.39 MB) Additional Information: full citation, abstract

Real-time procedural shading was once seen as a distant dream. When the first version of 
this course was offered four years ago, real-time shading was possible, but only with
one-of-a-kind hardware or by combining the effects of tens to hundreds of rendering passes.
Today, almost every new computer comes with graphics hardware capable of interactively 
executing shaders of thousands to tens of thousands of instructions. This course has been 
redesigned to address today's real-time shading capabili ... 
June 1999 Proceedings of the ACM 1999 conference on Java Grande
Publisher: ACM Press 
One of the major challenges in designing optimizing compilers, especially for scientific
computation, is to take advantage of the parallelism in loops in order to obtain maximum
speedup on parallel computer architectures. Optimal loop scheduling is therefore one of 
the most important topics studied by many computer scientists. However, how to allocate 
a minimum number of registers to support optimal loop scheduling for parallel 
architectures is less understood. In this report, we propose a simul ... 

16 Space-time scheduling of instruction-level parallelism on a raw machine 
Walter Lee, Rajeev Barua, Matthew Frank, Devabhaktuni Srikrishna, Jonathan Babb, Vivek 
Sarkar, Saman Amarasinghe 

October 1998 ACM SIGPLAN Notices, ACM SIGOPS Operating Systems Review,
Proceedings of the eighth international conference on Architectural
support for programming languages and operating systems ASPLOS-VIII,
Volume 33, 32 Issue 11, 5

Publisher: ACM Press 

Full text available: pdf(1.79 MB) Additional Information: full citation, abstract, references, citings, index
Increasing demand for both greater parallelism and faster clocks dictate that future
generation architectures will need to decentralize their resources and eliminate primitives 
that require single cycle global communication. A Raw microprocessor distributes all of its 
resources, including instruction streams, register files, memory ports, and ALUs, over a 
pipelined two-dimensional mesh interconnect, and exposes them fully to the compiler. 
Because communication in Raw machines is distributed, com ... 

17 Parallel processing: a smart compiler and a dumb machine 
Joseph A. Fisher, John R. Ellis, John C. Ruttenberg, Alexandru Nicolau 
June 1984 ACM SIGPLAN Notices, Proceedings of the 1984 SIGPLAN symposium on
Compiler construction SIGPLAN '84, volume 19 issue 6
Publisher: ACM Press 

Full text available: pdf(1.05 MB) Additional Information: full citation, abstract, references, citings

Multiprocessors and vector machines, the only successful parallel architectures, have 
coarse-grained parallelism that is hard for compilers to take advantage of. We've 
developed a new fine-grained parallel architecture and a compiler that together offer 
order-of-magnitude speedups for ordinary scientific code. 

18 Fusion-based register allocation 
Guei-Yuan Lueh, Thomas Gross, Ali-Reza Adl-Tabatabai 

May 2000 ACM Transactions on Programming Languages and Systems (TOPLAS),
Volume 22 Issue 3
Publisher: ACM Press 

Full text available: pdf(475.45 KB) Additional Information: full citation, abstract, references, citings, index terms, review

The register allocation phase of a compiler maps live ranges of a program to registers. If 
there are more candidates than there are physical registers, the register allocator must 
spill a live range (the home location is in memory) or split a live range (the live range 
occupies multiple locations). One of the challenges for a register allocator is to deal with 
spilling and splitting together. Fusion-based register allocation uses the structure of the 
program to make splitting and spilling d ... 

Keywords: performance evaluation, register allocation 
Speculative execution is an important source of parallelism for VLIW and superscalar
processors. A serious challenge with compiler-controlled speculative execution is to 
efficiently handle exceptions for speculative instructions. In this article, a set of 
architectural features and compile-time scheduling support collectively referred to as 
sentinel scheduling is introduced. Sentinel scheduling provides an effective framework for 
both compiler-controlled speculative executi ... 

Keywords: VLIW processor, exception detection, exception recovery, instruction
scheduling, instruction-level parallelism, speculative execution, superscalar processor
This paper describes RAP, a Register Allocator that allocates registers over the Program
Dependence Graph (PDG) representation of a program in a hierarchical manner. The PDG 
program representation has been used successfully for scalar optimizations, the detection 
and improvement of parallelism for vector machines, multiple processor machines, and 
machines that exhibit instruction level parallelism, as well as debugging, the integration of 
different versions of a program, and translation of ... 